The Wolfram Language provides several data structures for representing chemical species at different levels of granularity.
Start with a small BioSequence representing a single codon.
In[42]:=
bioseq=BioSequence["RNA","AUG"]
Out[42]=
BioSequence [Type: RNA Sequence, Content: AUG (3 letters)]

From this bio sequence you can create a ChemicalFormula or a Molecule, depending on your application.
In[54]:=
form=ChemicalFormula@bioseq
Out[54]=
C₂₉H₃₆N₁₂O₁₉P₂
In[53]:=
mol=Molecule@bioseq
Out[53]=
Molecule [Formula: C₂₉H₃₆N₁₂O₁₉P₂, Atoms: 98, Bonds: 105]

Equivalence between different representations can be checked easily using MoleculeMatchQ.
In[57]:=
{MoleculeMatchQ[mol,bioseq],MoleculeMatchQ[mol,form]}
Out[57]=
{True,True}
As the simplest representation, the formula allows you to find molecular mass and elemental composition.
In[62]:=
form[{"MolecularMass","ElementCounts"}]
Out[62]=

{918.6 u, <|carbon → 29, hydrogen → 36, nitrogen → 12, oxygen → 19, phosphorus → 2|>}
The molecule represents all atoms and bonds explicitly and allows computing topological properties or even generating a 3D structure.
In[63]:=
mol[{"AromaticRingCount","HBondDonorCount"}]
Out[63]=
{5,11}
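The same kind of query can also be written with MoleculeValue, and the atom and bond lists can be counted directly. The following is a minimal sketch that assumes mol is the Molecule defined above; the counts are expected to agree with the summary shown for mol (98 atoms, 105 bonds), although that has not been verified here.

```wolfram
(* Equivalent functional form of the property lookup on mol *)
MoleculeValue[mol, {"AromaticRingCount", "HBondDonorCount"}]

(* Count the explicit atoms and bonds stored in the molecule *)
{Length[AtomList[mol]], Length[BondList[mol]]}
```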
In[64]:=
MoleculePlot3D[mol,PlotTheme->"Spacefilling"]
Out[64]=
[3D space-filling plot of the molecule]
The bio sequence representation allows computation at a higher level of abstraction. Convert this sequence into DNA or into a peptide.
In[66]:=
BioSequenceTranscribe[bioseq]
Out[66]=
BioSequence [Type: DNA Sequence, Content: ATG (3 letters)]

In[65]:=
BioSequenceTranslate[bioseq]
Out[65]=
BioSequence [Type: Peptide Sequence, Content: M (1 letter)]
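The same sequence-level operations apply to longer inputs. The sketch below uses a hypothetical two-codon sequence, AUGGCC, chosen only for illustration, and assumes the "SequenceString" property is available for extracting the raw letters. Under the standard genetic code, AUG codes for methionine and GCC for alanine, so the translation should read MA.

```wolfram
(* Hypothetical two-codon RNA sequence: AUG (Met) followed by GCC (Ala) *)
rna = BioSequence["RNA", "AUGGCC"];

(* Translate to a peptide and extract its one-letter string *)
BioSequenceTranslate[rna]["SequenceString"]

(* Transcribe RNA -> DNA -> RNA and compare with the original letters *)
BioSequenceTranscribe[BioSequenceTranscribe[rna]]["SequenceString"] === rna["SequenceString"]
```

Comparing the sequence strings rather than the BioSequence objects themselves keeps the round-trip check independent of how the objects are represented internally.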
